Quality Assessment of Linked Datasets Using Probabilistic Approximation

Authors

  • Jeremy Debattista
  • Santiago Londoño
  • Christoph Lange
  • Sören Auer
Abstract

With the increasing application of Linked Open Data, assessing the quality of datasets by computing quality metrics becomes an issue of crucial importance. For large and evolving datasets, an exact, deterministic computation of the quality metrics is too time-consuming or expensive. We employ probabilistic techniques such as Reservoir Sampling, Bloom Filters and Clustering Coefficient estimation for implementing a broad set of data quality metrics in an approximate but sufficiently accurate way. Our implementation is integrated into the comprehensive data quality assessment framework Luzzu. We evaluated its performance and accuracy on Linked Open Datasets of broad relevance.
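
As an illustration of the first technique named in the abstract, the following is a minimal sketch of reservoir sampling (Algorithm R) used to approximate a ratio-style quality metric over a stream of triples. It is not taken from the paper or from Luzzu: the function names `reservoir_sample` and `estimate_ratio`, the tuple representation of triples, and the example metric are all assumptions made for illustration.

```python
import random
from typing import Callable, Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int,
                     rng: Optional[random.Random] = None) -> List[T]:
    """Keep a uniform random sample of at most k items from a stream
    of unknown length, using O(k) memory (Algorithm R)."""
    rng = rng or random.Random()
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i (0-based) replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


def estimate_ratio(sample: List[T], predicate: Callable[[T], bool]) -> float:
    """Hypothetical helper: fraction of sampled items satisfying a predicate."""
    if not sample:
        return 0.0
    return sum(1 for item in sample if predicate(item)) / len(sample)


if __name__ == "__main__":
    # Hypothetical triples as (subject, predicate, object) tuples.
    triples = ((f"ex:s{i}", "ex:p", f"ex:o{i}" if i % 4 else "") for i in range(100_000))
    sample = reservoir_sample(triples, k=1_000, rng=random.Random(42))
    # Approximate share of triples with a non-empty object.
    print(estimate_ratio(sample, lambda t: t[2] != ""))
```

The sample size `k` trades accuracy for memory and time: a larger reservoir gives a tighter estimate of the metric but costs more to keep and evaluate.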

Similar resources

Approximation Methods for Solving the Equitable Location Problem with Probabilistic Customer Behavior

Location-allocation of facilities in service systems is an essential factor in their performance. One important situation that is rarely addressed in the relevant literature is balancing service among customers in addition to minimizing location-allocation costs. This is an important issue, especially in the public sector. A review of recent research in this field shows that most of t...

Luzzu Quality Metric Language - A DSL for Linked Data Quality Assessment

The steadily growing number of linked open datasets brought about a number of reservations amongst data consumers with regard to the datasets’ quality. Quality assessment requires significant effort and consideration, including the definition of data quality metrics and a process to assess datasets based on these definitions. Luzzu is a quality assessment framework for linked data that allows d...

Improving Curated Web-Data Quality with Structured Harvesting and Assessment

This paper describes a semi-automated process, framework and tools for harvesting, assessing, improving and maintaining high-quality linked data. The framework, known as DaCura, provides dataset curators, who may not be knowledge engineers, with tools to collect and curate evolving linked data datasets that maintain quality over time. The framework encompasses a novel process, workflow and arc...

Assessing Quantity and Quality of Links Between Linked Data Datasets

The Linked Data Web is growing and it becomes increasingly necessary to analyze the relationship between datasets to exploit its full value. LOD datasets can range from datasets with low cohesion – containing data from different Fully Qualified Domain Names (FQDN) and namespaces – to highly cohesive datasets. This paper evaluates the quantity and quality of links between distributions, datasets...

Probabilistic Seismic Hazard Assessment of Tehran Based on Arias Intensity

A probabilistic seismic hazard assessment in terms of Arias intensity is presented for the city of Tehran. Tehran is the capital and most populous city of Iran. From economic, political and social points of view, Tehran is the most significant city in Iran. Many destructive earthquakes have occurred in Iran in recent centuries. Historical references indicate that the old city of Rey and the...

Publication date: 2015